Push compute engine value loading for longs down to tsdb codec. #132622

Conversation

martijnvg
Member

This is the first of many changes that push loading of field values down to the es819 doc values codec for logsdb/tsdb, when the field supports it.

This change first targets reading field values in bulk mode at the codec level when the doc values type is numeric or sorted, there is only one value per document, and the field is dense (all documents have a value). Multivalued and sparse fields are more complex to support bulk reading for, but it is possible.
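For context, a rough sketch of the difference between the two modes (the per-document loop uses Lucene's standard NumericDocValues API; the bulk call uses the appendLongs method this PR introduces, and the variable names here are illustrative):

// Per-document loading: advance the iterator and copy one value at a time.
for (int i = offset; i < docs.count(); i++) {
    if (numericDocValues.advanceExact(docs.get(i))) {
        builder.appendLong(numericDocValues.longValue());
    }
}

// Bulk loading: for a dense run of doc IDs, the codec hands over a whole
// decoded block of values in a single array copy.
builder.appendLongs(decodedBlock, startIndex, runLength);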

With this change, the following field types will support bulk read mode at the codec level under the described conditions: long, date, geo_point, point and unsigned_long.

Other number types like integer, short, double, float and scaled_float will be supported in a followup; they would be similar to long-based fields but require an additional conversion step to either an int or float vector.

This change originates from #132460 (which adds bulk reading to @timestamp, _tsid and dimension fields) and is basically the @timestamp support part of it. In another followup, support for single valued, dense sorted (set) doc values will be added for fields like _tsid.

Relates to #128445

Given that the optimization targets specific doc value fields produced by long field mappers, I experimented with the following query: FROM metrics-hostmetricsreceiver.otel-default | STATS min(@timestamp), max(@timestamp).
The metrics-hostmetricsreceiver.otel-default data stream contains 270 minutes of metrics, has 221184000 docs, and takes up 8.5 GB of storage. On my local machine, the query time without this change is ~180ms and with this change ~70ms.

Flamegraph without this change:

ESQL profiling of the query (with data_partitioning set to shard) without this change:

{
    "operator": "ValuesSourceReaderOperator[fields = [@timestamp]]",
    "status": {
        "readers_built": {
            "@timestamp:column_at_a_time:BlockDocValuesReader.SingletonLongs": 55
        },
        "values_loaded": 221184000,
        "process_nanos": 1155100354, <--- ~1155ms
        "pages_received": 10150,
        "pages_emitted": 10150,
        "rows_received": 221184000,
        "rows_emitted": 221184000
    }
}

Flamegraph with this change:

ESQL profiling of the query (with data_partitioning set to shard) with this change:

{
    "operator": "ValuesSourceReaderOperator[fields = [@timestamp]]",
    "status": {
        "readers_built": {
            "@timestamp:column_at_a_time:BlockDocValuesReader.BulkSingletonLong": 55
        },
        "values_loaded": 221184000,
        "process_nanos": 218763289, <-- ~218ms
        "pages_received": 10150,
        "pages_emitted": 10150,
        "rows_received": 221184000,
        "rows_emitted": 221184000
    }
}

@martijnvg martijnvg added the :StorageEngine/Mapping label Aug 10, 2025
@martijnvg martijnvg requested a review from dnhatn August 11, 2025 01:19
@martijnvg martijnvg marked this pull request as ready for review August 11, 2025 01:19
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics and Team:StorageEngine labels Aug 11, 2025
@elasticsearchmachine
Collaborator

Hi @martijnvg, I've created a changelog YAML for you.

/**
* Specialized builder for collecting dense arrays of long values.
*/
interface SingletonBulkLongBuilder extends Builder {
Member Author


The plan is to reuse this builder interface for other number field types too, and even for ordinal-based fields, given that at the codec level everything is stored as long[]. For other non-long field types we need a conversion step, but that can happen in the build() method, for example converting to int[] using Math.toIntExact(...) in a simple loop. So I don't expect us to introduce more interfaces here.
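A minimal sketch of that conversion step (values and count stand for the builder's internal buffer and are illustrative; Math.toIntExact is the JDK method that throws on overflow):

// Inside a hypothetical build() implementation: convert the buffered
// long[] values to int[] with one simple loop before building the vector.
int[] ints = new int[count];
for (int i = 0; i < count; i++) {
    ints[i] = Math.toIntExact(values[i]);
}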

Member


I think we need a SingletonInt instead for ordinals.

Member Author


I think that makes sense if we generalize the builder.

Member

@dnhatn dnhatn left a comment


Wow - more than a 5x speedup, impressive! Great changes; however, I think we should make them less invasive and more contained. Thanks, Martijn! I'm looking forward to seeing this PR merged.

bulkReader = new BulkReader() {

@Override
public void bulkRead(BlockLoader.SingletonBulkLongBuilder builder, BlockLoader.Docs docs, int offset)
Member


I think it would be more consistent to implement BlockLoader.Block read(BlockFactory factory, Docs docs, int offset) throws IOException instead.

}

-    private static class SingletonLongs extends BlockDocValuesReader {
+    static class SingletonLongs extends BlockDocValuesReader {
Member


Can we enable the optimization in BlockLoader.Block read(BlockFactory factory, Docs docs, int offset) of this class only?

public BlockLoader.Block read(BlockFactory factory, Docs docs, int offset) throws IOException {
    if (numericDocValues instanceof ... r) {
        return r.read(factory, docs, offset);
    }
    ...
}

Member Author


I think that should work as well.

Member Author


It took me a while, but this works out and it is now even simpler! 054b12e

@@ -1013,7 +1013,8 @@ public Function<byte[], Number> pointReaderIfPossible() {
     @Override
     public BlockLoader blockLoader(BlockLoaderContext blContext) {
         if (hasDocValues()) {
-            return new BlockDocValuesReader.LongsBlockLoader(name());
+            var indexMode = blContext.indexSettings().getMode();
+            return new BlockDocValuesReader.LongsBlockLoader(name(), indexMode);
Member


Do we need to pass indexMode? I think we can always enable optimizations if the underlying doc_values are dense and use our codec.

Member Author


Good point, we can just check the implementation of the numeric doc values. This should be sufficient.

Member Author


pushed: be0c77c


@@ -498,6 +509,14 @@ interface IntBuilder extends Builder {
IntBuilder appendInt(int value);
}

/**
Member


Can we rename this to SingletonLongBuilder and add support for appending a single long? I think we can use this builder when doc_values is dense, even if it's not from our codec. Also, we should consider extending LongVectorFixedBuilder to support bulking, but that's out of scope for this PR.
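A hedged sketch of what the renamed builder could look like (appendLongs matches the signature that appears later in this PR; appendLong is the single-value addition suggested here):

interface SingletonLongBuilder extends Builder {
    // Append one value, for paths that cannot read in bulk.
    SingletonLongBuilder appendLong(long value);

    // Append a dense run of values copied straight from the codec.
    SingletonLongBuilder appendLongs(long[] values, int from, int length);
}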

Member Author

@martijnvg martijnvg Aug 12, 2025


Renamed the class: 2924c402368fd58bc13adea5a943d8afa2fda963

I think we can use this builder when doc_values is dense, even if it's not from our codec.

I think so too; we would need to check numericDocValues#cost() == maxDoc in BlockDocValuesReader.SingletonLongs?

Also, we should consider extending LongVectorFixedBuilder to support bulking, but it's not an issue of this PR.

👍

Member Author


I pushed f097f4a, to use singleton long builder when we're dense even when not using es819 doc value codec.

@martijnvg martijnvg requested a review from dnhatn August 12, 2025 05:00
Member

@dnhatn dnhatn left a comment


I've left more comments, but we're close. Thanks, Martijn!

-        if (numericDocValues.advanceExact(doc)) {
+        if (numericDocValues instanceof BulkNumericDocValues bulkDv) {
+            return bulkDv.read(factory, docs, offset);
+        } else if (isDense) {
Member


I think it's unsafe to use cost for this. Would you mind reverting this part? We can find a way to enable it later. We need to do something similar to FieldExistsQuery#rewrite.

Member Author


Ok, I will revert.

I see that FieldExistsQuery#rewrite(...) relies on Terms#getDocCount(), and for that we need an inverted index for the same field. I think doc value skippers could also be used. But let's figure this out in another change.
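For illustration, a sketch of the FieldExistsQuery#rewrite-style check (leafReader and field are assumed to be in scope; this only works when the field is also indexed with terms):

// A field is provably dense when every document in the segment has at
// least one term for it, i.e. docCount == maxDoc.
Terms terms = leafReader.terms(field);
boolean dense = terms != null && terms.getDocCount() == leafReader.maxDoc();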

int remainingBlockLength = ES819TSDBDocValuesFormat.NUMERIC_BLOCK_SIZE - blockInIndex;
for (int newLength = remainingBlockLength; newLength > 1; newLength = newLength >> 1) {
int lastIndex = i + newLength - 1;
if (lastIndex < docsCount && isDense(index, docs.get(lastIndex), newLength)) {
Member


I like this logic! Can we limit remainingBlockLength to the min of (ES819TSDBDocValuesFormat.NUMERIC_BLOCK_SIZE - blockInIndex, docsCount - i) to allow a single copy of the last block? Note that there could be an issue with this logic for Lookup Join and Enrich, as the same doc IDs can appear multiple times. For example, this logic might mistakenly treat [1, 1, 2, 4] as [1, 2, 3, 4]. However, both Lookup and Enrich indices don't use this codec.
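Two illustrative fragments for the ideas in this comment (names follow the snippet above; the isDense helper body is a guess at the implementation under review):

// Cap the scan at the requested number of docs, so the final partial
// block can still be copied in a single pass.
int remainingBlockLength = Math.min(
    ES819TSDBDocValuesFormat.NUMERIC_BLOCK_SIZE - blockInIndex,
    docsCount - i
);

// A run of doc IDs is treated as dense when they are consecutive.
// Duplicate doc IDs (possible in Lookup Join / Enrich) would fool this:
// [1, 1, 2, 4] gives last - first == 3 == length - 1, same as [1, 2, 3, 4].
static boolean isDense(int firstDoc, int lastDoc, int length) {
    return lastDoc - firstDoc == length - 1;
}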

Member Author


Can we limit remainingBlockLength to the min of (ES819TSDBDocValuesFormat.NUMERIC_BLOCK_SIZE - blockInIndex, docsCount - i) to allow a single copy of the last block?

Let me try this.

Note that there could be an issue with this logic for Lookup Join and Enrich, as the same doc IDs can appear multiple times. For example, this logic might mistakenly treat [1, 1, 2, 4] as [1, 2, 3, 4]. However, both Lookup and Enrich indices don't use this codec.

I will add a comment about this here.

Member Author


pushed: 6ca5c66

Member


I think we can remove the lastIndex < docsCount check?

public BlockLoader.SingletonLongBuilder appendLongs(long[] newValues, int from, int length) {
try {
System.arraycopy(newValues, from, values, count, length);
} catch (ArrayIndexOutOfBoundsException e) {
Member


leftover?

Member Author


yes, for easy debugging :)

-        return docs[i];
+        try {
+            return docs[i];
+        } catch (ArrayIndexOutOfBoundsException e) {
Member


leftover?

@dnhatn dnhatn self-requested a review August 12, 2025 05:48
Member

@dnhatn dnhatn left a comment


LGTM. Thanks for all the iterations, @martijnvg!

@@ -0,0 +1,5 @@
pr: 132622
summary: Add bulk loading of dense singleton number doc values to tsdb codec and push compute engine value loading for longs down to tsdb codec
Member


I think the summary doesn't match the PR title?

@martijnvg martijnvg enabled auto-merge (squash) August 12, 2025 07:21
@martijnvg martijnvg merged commit 66107f1 into elastic:main Aug 12, 2025
33 checks passed